Reviews: Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition
Strengths of the paper are listed as follows: S1. The paper tackles the important problem of scene de-biasing for action recognition. It matters to the computer vision community to sanity-check whether proposed models really learn the dynamics of actions, rather than exploiting spurious biases such as the co-occurrence of scenes and actions. The authors develop a sensible solution: forcing the model to rely on the human region for recognition, which reduces the sensitivity of the action representation to the surrounding context. This is achieved by borrowing ideas from adversarial learning; that is, the ability of the action representation to recognize scenes is suppressed by directly applying gradient reversal [8], a well-known domain confusion method in the literature since 2015.
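For readers unfamiliar with gradient reversal, the following is a minimal scalar sketch of the idea, not the authors' implementation. The layer is the identity in the forward pass and multiplies the incoming gradient by a negative factor in the backward pass, so the scene head learns to predict the scene while the feature extractor is pushed to make that prediction harder. The toy linear model, the squared-error loss, and all names (`grl_backward`, `scene_loss_grads`, `lam`) are assumptions made for illustration.

```python
# Toy model (hypothetical): feature f = w * x, scene score s = v * f,
# scene loss = (s - target) ** 2. Gradients are derived by hand.


def grl_backward(upstream_grad, lam=1.0):
    """Backward rule of a gradient reversal layer: flip the sign and scale by lam."""
    return -lam * upstream_grad


def scene_loss_grads(w, v, x, target, lam=1.0):
    """Return the scene loss and the gradients seen by the extractor (w) and the head (v)."""
    f = w * x                      # feature; the reversal layer is the identity going forward
    s = v * f                      # scene prediction computed from the feature
    loss = (s - target) ** 2
    dloss_ds = 2.0 * (s - target)
    grad_v = dloss_ds * f          # ordinary gradient: the scene head minimizes the loss
    dloss_df = dloss_ds * v        # gradient arriving at the reversal layer from the head
    grad_w = grl_backward(dloss_df, lam) * x   # reversed gradient for the feature extractor
    return loss, grad_w, grad_v


loss, grad_w, grad_v = scene_loss_grads(w=0.5, v=2.0, x=1.0, target=0.0)

# One gradient-descent step on w using the reversed gradient raises the scene loss.
lr = 0.1
w_new = 0.5 - lr * grad_w
loss_after, _, _ = scene_loss_grads(w=w_new, v=2.0, x=1.0, target=0.0)
```

In this toy setting, the descent step on `w` increases the scene loss while a descent step on `v` would decrease it, which is the adversarial behaviour the review refers to.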
The initial scores for this paper were: 4: An okay submission, but not good enough; a reject. The main concerns of the negative reviewers were:
- issues with the problem formulation
- only weak baselines are considered; results are below the state of the art
- limited novelty
- missing citations
- only relatively minor improvements obtained by the proposed approach
The positive reviewer also acknowledges the issues with the experimental evaluation (the proposed method is shown to help weak baselines that are overall below the state of the art), but finds the idea of the paper interesting and original, and considers that it stands out. The authors provided a rebuttal. In the follow-up discussion among the reviewers, R3 acknowledges that some of their concerns have been addressed but remains borderline negative (5), since they think the rebuttal does not alleviate the concerns regarding the overall low results and some ablations are still missing. R2 agrees with the issues in the experimental evaluation pointed out by R1 and R3 but maintains that "given that the problem and the method are interesting and that there are no good dataset to study them, I would recommend accept."
Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition
Human activities often occur in specific scene contexts, e.g., playing basketball on a basketball court. A representation learned from data with such scene bias may not generalize well to new action classes or different tasks. In this paper, we propose to mitigate scene bias for video representation learning. Specifically, we augment the standard cross-entropy loss for action classification with 1) an adversarial loss for scene types and 2) a human mask confusion loss for videos where the human actors are masked out. These two losses encourage learning representations that cannot predict the scene type and cannot predict the correct action when there is no evidence for it.
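The way the three terms described above could fit together can be sketched for a single training example as follows. This is an illustration under stated assumptions, not the paper's exact formulation: the confusion term is written here as the negative entropy of the action prediction on the human-masked clip, a single hypothetical weight `lam` is shared by both extra terms, and the function and argument names are invented for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def debiasing_losses(action_logits, action_label,
                     scene_logits, scene_label,
                     masked_action_logits, lam=1.0):
    """Return the individual terms and the objective minimized by the feature extractor."""
    l_action = cross_entropy(action_logits, action_label)   # standard action loss
    l_scene = cross_entropy(scene_logits, scene_label)      # minimized by the scene head
    # ASSUMPTION: confusion is modeled as negative entropy, so minimizing it pushes
    # the action prediction on the human-masked clip toward a uniform distribution.
    l_confusion = -entropy(softmax(masked_action_logits))
    # The feature extractor is adversarial to the scene head, hence the minus sign.
    feature_objective = l_action - lam * l_scene + lam * l_confusion
    return {
        "action": l_action,
        "scene": l_scene,
        "confusion": l_confusion,
        "feature_objective": feature_objective,
    }


uniform = debiasing_losses([2.0, 0.0, 0.0], 0, [0.0, 0.0], 1, [0.0, 0.0, 0.0])
peaked = debiasing_losses([2.0, 0.0, 0.0], 0, [0.0, 0.0], 1, [5.0, 0.0, 0.0])
```

With these inputs, the masked clip that yields a confident action prediction (`peaked`) receives a larger confusion term than the one that yields a uniform prediction (`uniform`), which matches the stated goal of not predicting actions when the actor is absent. In practice the minus sign on the scene term is typically realized with a gradient reversal layer rather than by writing two separate objectives.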